Selective spatio-temporal interest points
Abstract
Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video-words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and shows state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in a realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
© 2011 Elsevier Inc. All rights reserved.
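The bag-of-video-words step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain k-means for the vocabulary, omits the spatial pyramid and vocabulary compression steps, and assumes each video is already given as an array of local descriptors (the function names `build_vocabulary` and `encode_video` are hypothetical).

```python
import numpy as np


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors with plain k-means (Lloyd's algorithm).

    descriptors: (n, d) array of local features pooled over training videos.
    Returns a (k, d) array of centroids, i.e. the visual words.
    Assumption: plain k-means stands in for the paper's vocabulary strategy.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct training descriptors.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every centroid.
        d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def encode_video(descriptors, centroids):
    """Represent one video as an L1-normalised histogram of visual words."""
    descriptors = np.asarray(descriptors, dtype=float)
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Histograms produced this way would then serve as fixed-length inputs to per-class classifiers such as the SVMs named in the abstract.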
Related resources
A Dense SURF and Triangulation Based Spatio-temporal Feature for Action Recognition
In this paper, we propose a novel method of extracting spatio-temporal features from videos. Given a video, we extract its features according to every set of N frames. The value of N is small enough to guarantee the temporal denseness of our features. For each frame set, we first extract dense SURF keypoints from its first frame. We then select points with the most likely dominant and reliable ...
Spatio-Temporal Analysis of Drought Severity Using Drought Indices and Deterministic and Geostatistical Methods (Case Study: Zayandehroud River Basin)
Drought monitoring is a fundamental component of drought risk management. It is normally performed using various drought indices that are effectively continuous functions of rainfall and other hydrometeorological variables. In many instances, drought indices are used for monitoring purposes. Geostatistical methods allow the interpolation of spatially referenced data and the prediction of v...
Space-time Interest Points
Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we propose to extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for its interpretation. To detect spatio-...
On Space-Time Interest Points
Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features capture interesting events in video and can be used for a compact representation and for interpretation of video data. To detect spatio-temporal events, we ...
Study of Human Action Recognition Based on Improved Spatio-temporal Features
Most existing action recognition methods mainly utilize spatio-temporal descriptors of single interest points, ignoring their potential integral information, such as spatial distribution information. By combining local spatio-temporal features and global positional distribution information (PDI) of interest points, a novel motion descriptor is proposed in this paper. The proposed method detec...
A Spatio-temporal Extension of the SUSAN-Filter
This paper proposes a detector for spatio-temporal interest points. Interest point detection is a common technique in computer vision to extract salient regions and represent them by a single point for further processing. But while many algorithms exist for static images, there is hardly any method to obtain interest points from image sequences for the representation of salient motion. Here we ...
Journal: Computer Vision and Image Understanding
Volume: 116
Publication date: 2012